Fix Nokogiri parser processing xml entity encoded characters #64

tinbka · 2014-09-08T12:33:53Z

In Nori 2.4.0, the Nokogiri parser has a bug when it processes XML entity-encoded characters,
which are common in non-English SOAP applications.

For example, if we have the following node:

<ns0:ASSIGNEDGROUP>&#x414;&#x435;&#x436;&#x443;&#x440;&#x43D;&#x430;&#x44F; &#x441;&#x43C;&#x435;&#x43D;&#x430; &#x41B;&#x41F;1.5</ns0:ASSIGNEDGROUP>

and we want to convert it to a hash using Nori with the Nokogiri parser, the expected result is:

{"ns0:ASSIGNEDGROUP"=>"Дежурная смена ЛП1.5"}

but the current version of Nori converts it to:

{"ns0:ASSIGNEDGROUP"=>"ДежурнаясменаЛП1.5"}

since every whitespace character in the source XML lies between two other nodes and is therefore treated as insignificant whitespace and dropped.

This patch fixes that case without changing behaviour in any other way.

tjarratt · 2014-09-22T20:02:01Z

Thanks for submitting this pull request @tinbka. Your fix looks great -- my only concern is that we don't have any specs covering this. However, the fix looks rather harmless, so I'm willing to backfill tests at a later time.

Thanks for submitting to Nori!

Fix Nokogiri parser processing xml entity encoded characters

502b989

tjarratt added a commit that referenced this pull request Sep 22, 2014

Merge pull request #64 from tinbka/chars-parsing-fix

3a9cdb6

Fix Nokogiri parser processing xml entity encoded characters

tjarratt merged commit 3a9cdb6 into savonrb:master Sep 22, 2014

AlexanderZagaynov mentioned this pull request Jul 1, 2020

fix 'undefined method for nil:NilClass' on wrong xmls #82

Open
